arXiv:1507.04199v1 [stat.AP] 15 Jul 2015


Submitted to the Annals of Applied Statistics


EVALUATING THE CAUSAL EFFECT OF UNIVERSITY GRANTS ON 
STUDENT DROPOUT: EVIDENCE FROM A REGRESSION 
DISCONTINUITY DESIGN USING PRINCIPAL STRATIFICATION 

By Fan Li, Alessandra Mattei and Fabrizia Mealli

Duke University and University of Florence

Regression discontinuity (RD) designs are often interpreted as local randomized
experiments: a RD design can be considered as a randomized experiment for units
with a realized value of a so-called forcing variable falling around a pre-fixed
threshold. Motivated by the evaluation of Italian university grants, we consider a
fuzzy RD design where the receipt of the treatment is based on both eligibility
criteria and a voluntary application status. Resting on the fact that grant
application and grant receipt statuses are
post-assignment (post-eligibility) intermediate variables, we use the principal 
stratification framework to define causal estimands within the Rubin Causal 
Model. We propose a probabilistic formulation of the assignment mechanism 
underlying RD designs, by re-formulating the Stable Unit Treatment Value 
Assumption (SUTVA) and making an explicit local overlap assumption for 
a subpopulation around the threshold. A local randomization assumption is 
invoked instead of more standard continuity assumptions. We also develop 
a model-based Bayesian approach to select the target subpopulation(s) with 
adjustment for multiple comparisons, and to draw inference for the target 
causal estimands in this framework. Applying the method to the data from 
two Italian universities, we find evidence that university grants are effective 
in preventing students from low-income families from dropping out of higher 
education. 


1. Introduction. Amid the recent economic crisis in Europe, there has been a heated
debate on how to arrange financial support for college students, especially in terms of
the instruments used, e.g., loans, grants, tuition waivers. Accurate evaluation of the
effectiveness of the existing financial aid systems is crucial for providing information to
policy makers to choose between different instruments. In Italy, state universities offer
financial aid every year to a limited number of eligible freshmen. The main objective of
this intervention is to give equal opportunity to achieve higher education to motivated
students irrespective of their economic background. Dropout from university is a relevant
phenomenon in Italy: indeed, the low rate of university graduates among Italian youths is
mainly due to the high dropout rate (about 30%) rather than to a low enrollment rate. In
this paper, we will investigate the causal effects of Italian university grants on
preventing students from low-income
families from dropping out of higher education, using data on first-year enrollees from two 
state universities. 

In the Italian university system, only students who both meet pre-fixed eligibility
criteria and apply for a grant can receive the grant, consisting of tuition waiver, free 
meals and accommodation, and a limited amount of money around 3 000 Euros. The 
eligibility status depends on an economic measurement of the student’s family income 
and assets falling below or above a pre-determined threshold. This allocation rule
motivates us to adopt the regression discontinuity (RD) design framework for evaluation.
The RD design, a quasi-experimental design for causal inference, was first introduced in
psychology by Thistlethwaite and Campbell (1960) and has become increasingly popular
since the late 1990s in economics and other fields. Recent surveys can be found in Cook 
(2008); Imbens and Lemieux (2008); van der Klaauw (2008); Lee and Lemieux (2010). 
There are two general setups in RD designs, the sharp and the fuzzy RD designs. In the 
sharp RD design, the original form of the design, the treatment status is assumed to be a
deterministic step function of a so-called forcing variable or running variable. All units with
a realized value of the forcing variable on one side of a pre-fixed threshold are assigned to 
one regime and all units on the other side are assigned to the other regime. The basic idea 
underlying a RD analysis is that one can compare units with very similar values for the 
forcing variable, but different levels of treatment, to draw inference on the causal effect of 
the treatment at the threshold. Examples of sharp RD designs can be found, among others, 
in Berk and de Leeuw (1999); Lee (2008); Mealli and Rampichini (2012). In the fuzzy RD
design, the realized value of the forcing variable does not alone determine the receipt of 
the treatment, although a value of the forcing variable falling above or below the threshold 
acts as an encouragement or incentive to participate in the treatment. In those cases, the
receipt of the treatment depends also on individual choices, which may confound treatment
receipt. Hahn, Todd and Van der Klaauw (2001) establish a connection between fuzzy RD
designs and the instrumental variable (IV) settings, and show that in a fuzzy RD setting one
can identify the local average treatment effect (Imbens and Angrist, 1994) for a
subpopulation of compliers at the threshold. Examples of fuzzy RD designs can be found, among
others, in van der Klaauw (2002); Battistin and Rettore (2008); Garibaldi et al. (2012). 

The Italian university grant allocation rule defines a fuzzy RD design because not all
eligible students get a grant: they must apply first, and application is voluntary.
Ineligible students may also apply, even though they will not receive any grant. Compared to
standard fuzzy RD designs where only assignment (eligibility) and receipt of the treatment
(grant) are available, the additional data on the application status in this study can
provide valuable information with important policy implications. In this article, we will
show how to
capitalize on application. In particular, a main methodological contribution of this article 
is to develop a framework for RD analysis that is fully consistent with the Rubin Causal 
Model (RCM, Rubin, 1974, 1978) using potential outcomes. Resting on the fact that grant
application and grant receipt statuses are post-assignment (post-eligibility) intermediate
variables, we adopt the principal stratification framework (Frangakis and Rubin, 2002), a
generalization of the IV approach to noncompliance (Angrist, Imbens and Rubin, 1996;
Imbens and Rubin, 1997), to define causal estimands and lay the basis for inference.

In the literature, causal inference in RD designs is usually based on comparisons of units
with close but distinct values of the forcing variable and relies on smoothness assumptions 
about the relationship between outcomes and the forcing variable around the threshold, 
which imply randomization at the single threshold value. For example, in fuzzy RDs,
estimands are usually specified as ratios of differences of regression functions at the threshold,
and inference generally relies on asymptotic approximations (e.g. Imbens and Lemieux, 
2008). In real applications, large-sample approximations might be unreliable due to the 
small sample size, and exact inference would be preferable. RD designs have often been
described as designs that lead to locally randomized experiments around the threshold
(Lee, 2008; Lee and Lemieux, 2010; DiNardo and Lee, 2011). Expanding on this
interpretation, a recent strand of the literature (e.g., Cattaneo, Frandsen and Titiunik, 2015;
Sales and Hansen, 2014) is moving towards a formal and well-structured definition of the 
conditions under which RD designs are equivalent to local randomized experiments. 

We further develop the idea of local randomization; our goal is to provide a formal
definition of the hypothetical experiment underlying RD designs, based on a description of
the assignment mechanism, i.e., the process that describes why some units got assigned
to different treatments, formalized as a unit-exchangeable stochastic function of
covariates and potential outcomes. The core of our framework is to assume there exists at least
one subpopulation around the threshold where a local overlap assumption holds. For this 
subpopulation we explicitly introduce a local randomization assumption. 

Though our framework is not tied to any mode of inference, we choose the Bayesian 
approach for reasons explained later. In particular, a second methodological
contribution of this article lies in our development of a Bayesian hierarchical modeling approach
to adjust for multiple comparisons in selecting the target subpopulation(s). Our work
contributes to the limited literature on Bayesian analysis of RD designs (Chib and Jacobi, 2011;
Chib and Greenberg, 2014), as well as to the literature on Bayesian causal inference (e.g.,
Rubin, 1978; Imbens and Rubin, 1997; Barnard et al., 2003; Elliott, Raghunathan and Li,
2010; Schwartz, Li and Mealli, 2011; Mattei, Li and Mealli, 2013).

In Section 2, we introduce the basic setup and the causal estimands. In Section 3, we 
propose a probabilistic formulation of the assignment mechanism for general RD designs, 
explicitly formulating the key assumptions, and elaborate it for the particular RD design 
used in the Italian university grants. Selection of the subpopulations where these
assumptions hold is also discussed. A Bayesian approach for inferring causal effects in RD designs
is developed in Section 4. We then apply the proposed approach to evaluate causal effects
of Italian university grants on student dropout in Section 5. Section 6 concludes.

2. Causal estimands.

2.1. Basic setup. We introduce the notation in the context of Italian university grants. 
Let Z be the eligibility status, which is the initial assignment and plays the role of an 
“instrument” or an “encouragement” as in randomized experiments with noncompliance. 
Consider a sample or population of N units; each can be either eligible to receive a
treatment, z = 1, or ineligible, z = 0. In the Italian grants system, eligibility depends on the
value of a combined measurement of one's assets including income and properties,
adjusted for family size, denoted by S. If a student, satisfying preliminary grade criteria, has
a value of S falling below a pre-determined threshold, e.g., $s_0$ = 15 000 euros, he/she is
eligible, and not otherwise. That is, the eligibility status $Z_i$ for student i is a deterministic
function of $S_i$: $Z_i = 1(S_i \le s_0)$, where $1(\cdot)$ is the indicator function. Using the
terminology in RD designs, S is the forcing variable.

All variables measured after each unit i is assigned eligibility $Z_i$, namely, the application
status, the receipt of the grant and the dropout status, are post-assignment variables, and,
in principle, eligibility may affect them. Thus we can define potential outcomes for these
variables: for each student i (i = 1, ..., N), given eligibility status z (z = 0, 1), let $A_i(z)$
be an indicator for the potential grant application status (equal to 1 if student i applies for
a grant and 0 otherwise), $W_i(z)$ be an indicator for the potential treatment received (equal
to 1 if student i receives a grant and 0 otherwise), and $Y_i(z)$ be the potential indicator for
dropout (1 if student i drops out of university, 0 otherwise). These notations, with only two
potential outcomes for each post-treatment variable for each unit, reflect the acceptance of 
the Stable Unit Treatment Value Assumption (SUTVA, Rubin, 1980), which implies that 
there is no interference between units and that there are no levels of the eligibility status 
other than zero and one. A more explicit formulation of SUTVA will be introduced in 
Section 3.1. 

For each unit i, given the observed eligibility status $Z_i$, the following variables are
observed: $A_i^{obs} = A_i(Z_i)$, the observed application status; $W_i^{obs} = W_i(Z_i)$, the
observed treatment received; and $Y_i^{obs} = Y_i(Z_i)$, the observed dropout status. The
remaining potential outcomes are missing: $A_i^{mis} = A_i(1 - Z_i)$, $W_i^{mis} = W_i(1 - Z_i)$,
and $Y_i^{mis} = Y_i(1 - Z_i)$. A vector of p pre-treatment variables, $X_i$, is also observed
for each unit. We use boldface upper-case letters to denote the vector of values of a
variable for all units from hereon. For example,
$\mathbf{Z} = (Z_1, \ldots, Z_N)'$, $\mathbf{A}^{obs} = (A_1^{obs}, \ldots, A_N^{obs})'$.
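
The bookkeeping of observed and missing potential outcomes can be sketched in a few lines of Python. This is only an illustration: the function names `eligibility` and `split_observed_missing` and all numeric values are hypothetical, and the weak inequality in the eligibility rule is an assumption about how ties at the threshold are treated.

```python
def eligibility(s: float, s0: float) -> int:
    """Eligibility indicator Z: 1 if the forcing variable does not exceed s0.

    Assumption: ties at the threshold count as eligible.
    """
    return 1 if s <= s0 else 0


def split_observed_missing(potential: tuple[int, int], z: int) -> tuple[int, int]:
    """Split a pair (value under z=0, value under z=1) into (observed, missing).

    The observed value is the one indexed by the realized eligibility z;
    the other potential outcome is never seen for this unit.
    """
    return potential[z], potential[1 - z]


# Hypothetical student: forcing variable 12 000 euros, threshold 15 000 euros.
z = eligibility(12_000, 15_000)
a_obs, a_mis = split_observed_missing((0, 1), z)  # (A(0), A(1)) = (0, 1)
y_obs, y_mis = split_observed_missing((1, 0), z)  # (Y(0), Y(1)) = (1, 0)
print(z, a_obs, a_mis, y_obs, y_mis)
```

In real data only the observed halves are available; the full pairs are written out here solely to make the split visible.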

2.2. The role of Principal Stratification for causal inference in Fuzzy RD designs. In 
the RCM, a causal effect is defined as a comparison of the potential outcomes $Y_i(1)$ and
$Y_i(0)$, e.g., $E[Y_i(1) - Y_i(0)]$, for a common set of units. Obviously, in our study, such
comparisons between potential dropout statuses only measure the effect of the eligibility
status. To draw inference about the causal effect of receiving a grant, additional
structure and assumptions are required. Since both the application status and receipt of the
grant are post-assignment intermediate variables, we adopt the Principal Stratification
(Frangakis and Rubin, 2002) framework.

For each intermediate variable, principal stratification defines a cross-classification of
subjects into groups, named principal strata, defined by the joint potential values of that
intermediate variable under each of the assignments being compared. In our study, based
on the application status A, subjects are classified into four (latent) principal strata,
$G_i = (A_i(0), A_i(1))$: compliant-applicants $G_i = (0,1) = CA$, students who would not apply if
ineligible, but would apply if eligible; always-applicants $G_i = (1,1) = AA$, students who
would apply irrespective of their eligibility status; never-applicants $G_i = (0,0) = NA$,
students who would not apply irrespective of their eligibility status; and defiant-applicants
$G_i = (1,0) = DA$, students who would not apply if eligible, but would apply if ineligible.
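
The four-way classification, and the reason the strata are latent, can be illustrated with a short Python sketch. The names `STRATA`, `principal_stratum` and `compatible_strata` are hypothetical helpers introduced here for illustration only.

```python
# Map joint potential application statuses (A(0), A(1)) to stratum labels.
STRATA = {
    (0, 1): "CA",  # compliant-applicant
    (1, 1): "AA",  # always-applicant
    (0, 0): "NA",  # never-applicant
    (1, 0): "DA",  # defiant-applicant
}


def principal_stratum(a0: int, a1: int) -> str:
    """Return the stratum label for potential statuses A(0)=a0, A(1)=a1."""
    return STRATA[(a0, a1)]


def compatible_strata(z: int, a_obs: int) -> set[str]:
    """Strata consistent with one observed pair (Z, A_obs).

    Only A(z) is observed, so each unit remains compatible with two strata.
    """
    return {label for pair, label in STRATA.items() if pair[z] == a_obs}


print(principal_stratum(0, 1))
print(sorted(compatible_strata(z=1, a_obs=1)))
```

An eligible applicant, for instance, may be either an always-applicant or a compliant-applicant; the observed data alone cannot tell the two apart.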

Because principal strata are not affected by assignment, we can define population-average 
causal effects conditional on the principal strata, known as principal causal effects: 

(1)   $\tau_g = E[Y_i(1) - Y_i(0) \mid G_i = g]$,

for g = AA, CA, NA, DA. Then the average causal effect of eligibility on dropout is a
weighted average of these principal causal effects:

$E[Y_i(1) - Y_i(0)] = \sum_{g = AA, CA, NA, DA} \pi_g \tau_g$,

where $\pi_g$ is the proportion of units in principal stratum g.

Never-applicants and defiant-applicants never receive a grant, so for them we always 
observe the outcome in the absence of the grant. By contrast, for always-applicants and 
compliant-applicants we can observe $Y_i(1)$ for some eligible students who receive a grant
and $Y_i(0)$ for some other ineligible students who do not receive a grant. Therefore,
always-applicants and compliant-applicants are the only groups where we can learn information
about the effect of receiving a grant in this study, and thus the corresponding principal
causal effects, $\tau_{AA}$ and $\tau_{CA}$, are the causal estimands of primary interest.

In the standard IV approach to noncompliance (Angrist, Imbens and Rubin, 1996; Imbens and Rubin, 
1997) as well as standard setting of fuzzy RD designs (e.g., Imbens and Lemieux, 2008), 
data on application status is not utilized, either because it is not available or because it 
is ignored. Instead, the analysis is based on the principal strata formed by the
intermediate variable of grant receipt status. Specifically, there are four principal strata based on
the joint potential grant receipt statuses, $R_i = (W_i(0), W_i(1))$: compliers $R_i = (0,1)$,
students who would receive the grant if eligible and would not receive the grant if ineligible;
always-takers $R_i = (1,1)$, students who would receive the grant regardless of eligibility;
never-takers $R_i = (0,0)$, students who would not receive the grant regardless of eligibility;
and defiers $R_i = (1,0)$, students who would not receive the grant if eligible and would
receive the grant if ineligible. The focus is generally on the causal effect for compliers:

$\tau = E[Y_i(1) - Y_i(0) \mid R_i = (0,1)]$.

We now establish the connection between these two sets of principal strata. The Italian
grant assignment rule implies that $W_i(0) = 0$ for all i, as ineligible units have no access to
a grant, and $W_i(1) = 0$ if $A_i(1) = 0$, as eligible units need to apply for a grant to receive
a grant. Therefore, by design, there are no always-takers or defiers, and the remaining
principal strata R's can be expressed as unions of principal strata G's: never-takers
comprise never-applicants and defiant-applicants, and compliers comprise always-applicants
and compliant-applicants. As such, $\tau$ can be rewritten as the weighted average of the causal
effects for always-applicants and compliant-applicants:

(2)   $\tau = E[Y_i(1) - Y_i(0) \mid G_i \in \{AA, CA\}] = \frac{\pi_{AA}\tau_{AA} + \pi_{CA}\tau_{CA}}{\pi_{AA} + \pi_{CA}}$.

This illustrates that principal strata defined by the application status lead to a finer
partition of the units than principal strata defined by the grant-receipt status. Indeed the
standard IV causal estimand, the causal effect for compliers $\tau$, provides information on a
'marginal' (with respect to application behavior) causal effect. If causal effects are
homogeneous, marginalizing over application behavior does not critically affect the evaluation
analysis. Conversely, if causal effects are heterogeneous, as we have found in this study,
ignoring application behavior represents a loss of useful information with potentially
important policy implications. For example, if the grants are found to have a higher
positive effect on always-applicants than on compliant-applicants, then it would be useful and
cost-effective to study the characteristics of ineligible applicants and include those into the
eligibility rule to allocate additional resources.
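
The link between the two sets of strata, and the weighted average in Equation (2), can be checked with a small Python sketch. It assumes the design rule that a grant requires both eligibility and application, W(z) = z x A(z); the function names and the stratum proportions and effects below are hypothetical and are not estimates from the paper.

```python
def grant_receipt(z: int, a_z: int) -> int:
    """Potential grant receipt W(z) = z * A(z)."""
    return z * a_z


def receipt_stratum(a0: int, a1: int) -> tuple[int, int]:
    """Receipt-based stratum R = (W(0), W(1)) implied by (A(0), A(1))."""
    return grant_receipt(0, a0), grant_receipt(1, a1)


def complier_effect(pi_aa: float, tau_aa: float, pi_ca: float, tau_ca: float) -> float:
    """Equation (2): complier effect as a weighted average of tau_AA and tau_CA."""
    return (pi_aa * tau_aa + pi_ca * tau_ca) / (pi_aa + pi_ca)


# AA and CA both map to compliers (0, 1); NA and DA both map to never-takers (0, 0).
for label, (a0, a1) in {"AA": (1, 1), "CA": (0, 1), "NA": (0, 0), "DA": (1, 0)}.items():
    print(label, receipt_stratum(a0, a1))

# Hypothetical heterogeneous effects on the dropout probability.
tau = complier_effect(pi_aa=0.2, tau_aa=-0.10, pi_ca=0.3, tau_ca=-0.02)
print(round(tau, 3))
```

With these made-up numbers the complier effect sits between the two stratum-specific effects, which is exactly the information that is lost when application behavior is ignored.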

The estimands $\tau_{AA}$, $\tau_{CA}$ and $\tau$ represent effects of eligibility, rather than effects of the
receipt of a grant. However, “the receipt of a grant” is completely confounded with “the
eligibility status”: $W(z) = z \times A(z) = z$ for always-applicants and compliant-applicants.

To attribute these effects to “the receipt of a grant”, below we can make an exclusion 
restriction assumption: 

Assumption 1. (Exclusion Restriction for Compliant-Applicants and Always-Applicants). 
For all units with $G_i \in \{AA, CA\}$, or equivalently $R_i = (0,1)$, the effect of eligibility is only
through the receipt of the grant. 

Assumption 1 attributes the intention-to-treat effect for compliers to the causal effect of 
the receipt of grant, rather than to its assignment (eligibility). A more formal version of 
this assumption, which requires double-indexed notations, is given in Imbens and Rubin 
(2015) (Chapter 23, Assumption 23.4). This type of exclusion restriction is routinely made, 
often implicitly, in randomized experiments with full compliance (Mealli and Rubin, 2002; 
Mealli and Pacini, 2013; Imbens and Rubin, 2015). 

In real studies, the sample-average counterpart of the population-average estimands
may also be of interest:

(3)   $\frac{1}{N_g} \sum_{i: G_i = g} [Y_i(1) - Y_i(0)]$,

where g = AA, CA, {AA, CA} and $N_g$ is the number of units in stratum g. Usually the
sample-average effects can be estimated more precisely than their population-average
counterparts. The subtle difference between them in Bayesian inference is explained in
Section 4. More details can be found, for example, in Rubin (1978); Imbens and Rubin
(1997) and Imbens (2004). For simplicity of notation, we do not make the distinction
between population-average and sample-average estimands in the methodological
discussion, but will present both estimates in the application.


3. The basis for inference. 


3.1. Probabilistic treatment assignment mechanism in RD designs. The complex
selection process in the Italian university grants system implies that the mechanism governing
the receipt of the grant, which depends on both institutional and individual choices, is 
not ignorable. Below we introduce a probabilistic assignment mechanism underlying the 
RD design considered here, which is also applicable to general RD settings with minor 
modifications. 

We first define the assignment mechanism, which is a row-exchangeable function that
assigns probabilities to all $2^N$ possible N-dimensional vectors of assignments $\mathbf{Z}$, as a
row-exchangeable function that assigns probabilities to all possible N-dimensional vectors of
realizations of the forcing variable, $\mathbf{S}$, above or below the threshold value, $s_0$. Formally,

(4)   $\Pr(\mathbf{Z} = \mathbf{z} \mid \mathbf{A}(0), \mathbf{A}(1), \mathbf{W}(0), \mathbf{W}(1), \mathbf{Y}(0), \mathbf{Y}(1), \mathbf{X})$
      $= \Pr(\mathbf{S} \in \mathcal{A} \mid \mathbf{A}(0), \mathbf{A}(1), \mathbf{W}(0), \mathbf{W}(1), \mathbf{Y}(0), \mathbf{Y}(1), \mathbf{X})$,

where $\mathbf{z} \in \{0,1\}^N$ and
$\mathcal{A} \in \{(-\infty, s_0]^N,\ (-\infty, s_0]^{N-1} \times (s_0, \infty),\ (s_0, \infty) \times (-\infty, s_0]^{N-1},\ \ldots,\ (-\infty, s_0] \times (s_0, \infty)^{N-1},\ (s_0, \infty)^N\}$.
Since $\mathbf{Z}$ is a deterministic function of $\mathbf{S}$, the assignment mechanism can be formulated
with respect to either $\mathbf{Z}$ or $\mathbf{S}$. Here we prefer $\mathbf{S}$ because it is the underlying random
variable that describes the reasons for the missing and observed values of potential
outcomes: a value of S is assigned, which in turn determines a value for Z.

Statistical inference for causal effects requires assumptions on the assignment
mechanism. We introduce assumptions that allow us to describe RD settings as classical
randomized experiments around the threshold. The assignment mechanism in Equation (4) is a
classical randomized experiment if (i) it is individualistic:

$\Pr(\mathbf{S} \in \mathcal{A} \mid \mathbf{A}(0), \mathbf{A}(1), \mathbf{W}(0), \mathbf{W}(1), \mathbf{Y}(0), \mathbf{Y}(1), \mathbf{X})$
$= \prod_{i=1}^{N} \Pr(S_i \le s_0 \mid A_i(0), A_i(1), W_i(0), W_i(1), Y_i(0), Y_i(1), X_i)$;

(ii) it is probabilistic, which implies that for each unit i, both events $S_i \le s_0$ and $S_i > s_0$
have a priori a non-zero probability of occurring; (iii) it is unconfounded, that is, free of
dependence on any potential outcomes; and (iv) it is a known function of its arguments.

The particular assignment rules underlying RD designs suggest that these assumptions 
are more reasonable for subpopulations of units who have a relatively large probability 
that the realized values of the forcing variable fall in a neighborhood around the threshold,
$s_0$. For these subpopulations, we can reasonably assume that the distribution of the forcing
variable is unrelated to observed and unobserved characteristics of students. On the other
hand, students with a very small (close to zero) or a very large (close to one) probability
that $S_i \le s_0$ are likely systematically different from other students. For example,
potential outcomes observed for very rich students, who do not receive any grant, are plausibly
different from potential outcomes for poor students with a realized value of S around the
threshold, who do not receive a grant, and vice versa. Therefore we focus on
subpopulations of students who have a probability that $S_i \le s_0$ strictly between zero and one, and
sufficiently far away from zero and one. The following assumption guarantees that at least
one such subpopulation of units exists. 

Assumption 2. (Local overlap). Let $\mathcal{U}$ be the random sample (or population) of units
in the study. There exists a subset of units, $\mathcal{U}_{s_0}$, such that for each $i \in \mathcal{U}_{s_0}$,
$\Pr(S_i \le s_0) > \epsilon$ and $\Pr(S_i > s_0) > \epsilon$ for some sufficiently large $\epsilon > 0$.

Assumption 2 assumes that there exists a subpopulation of units, each of whom has a
non-zero probability of being assigned to either treatment level. This represents a main
distinction between our framework and the existing RD literature that often describes RD
designs as settings where the overlap assumption is violated. Now within the
subpopulation $\mathcal{U}_{s_0}$ we can formally introduce a modified SUTVA specific to the RD settings:

Assumption 3. (Local RD-SUTVA). For each $i \in \mathcal{U}_{s_0}$, consider two eligibility statuses
$Z_i' = 1(S_i' \le s_0)$ and $Z_i'' = 1(S_i'' \le s_0)$, with possibly $S_i' \ne S_i''$. If $Z_i' = Z_i''$, that is, if
either $S_i' \le s_0$ and $S_i'' \le s_0$, or $S_i' > s_0$ and $S_i'' > s_0$, then $A_i(Z_i') = A_i(Z_i'')$,
$W_i(Z_i') = W_i(Z_i'')$, and $Y_i(Z_i') = Y_i(Z_i'')$.

Local RD-SUTVA rules out interference between units, implying that potential
outcomes for a student cannot be affected by the eligibility status of other students. Local
RD-SUTVA also assumes that there are no levels of the eligibility status other than zero
and one. This component of RD-SUTVA implies that values of the forcing variable
leading to the same eligibility status cannot alter potential outcomes for any unit, and thus
allows us to avoid defining potential outcomes as functions of the forcing variable. Under
the local RD-SUTVA, for each unit within $\mathcal{U}_{s_0}$ there exist only two potential outcomes for
each post-assignment variable, corresponding to the realized value of the forcing variable
falling below and above the threshold, respectively.

Finally, we need to formalize the concept of RD design as local randomized experiment:
in a neighborhood of the threshold the forcing variable does not depend on either the 
potential outcomes or pre-treatment variables. Formally, we have: 

Assumption 4. (Local randomization). For each $i \in \mathcal{U}_{s_0}$,

$\Pr(S_i \mid A_i(0), A_i(1), W_i(0), W_i(1), Y_i(0), Y_i(1), X_i) = \Pr(S_i)$.


Assumption 4 states that within the subpopulation $\mathcal{U}_{s_0}$ a Bernoulli trial has been
conducted, with individual assignment probabilities, that is, the individual probabilities of
being eligible to receive a grant, depending only on the distribution of the forcing variable:
$\Pr(Z_i = 1) = \Pr(S_i \le s_0)$. This assumption is crucial in justifying the key idea underlying
any RD design. It implies that the eligibility statuses are randomly assigned in some small
neighborhood around $s_0$.

Assumption 4 may not always be plausible. For instance, when the forcing variable
is a deterministic variable, which conceptually cannot be interpreted as a random
variable with a non-degenerate probability distribution (such as time), the underlying design
cannot, in general, be interpreted as a local randomized experiment (see Section 6.3 in
Lee and Lemieux, 2010, p. 347).

There are subtle but substantive differences between local RD-SUTVA and local
randomization. Local RD-SUTVA is an exclusion restriction assumption and it is required to
make the representation of potential outcomes as functions of the eligibility status
adequate. Local randomization is an independence assumption and it is crucial to make
inference. RD-SUTVA is different from independence assumptions: it does not imply that the
probability that we observe a value of the forcing variable above or below the threshold
does not depend on potential outcomes. RD-SUTVA simply implies that the exposure to
assignment level z specifies well-defined potential outcomes, for all units i and assignment
levels z. In other words, considering potential outcomes as random variables, RD-SUTVA
does not imply that potential outcomes have the same distribution for each value of the
forcing variable. In order to make the forcing variable independent of potential outcomes,
we need to introduce additional assumptions, such as Assumption 4.

Following Assumption 2, we can define a local version of the target estimands:

(5)   $\tau_{g,s_0} = E[Y_i(1) - Y_i(0) \mid G_i = g, i \in \mathcal{U}_{s_0}]$,

for g = AA, CA, {AA, CA} and their finite-sample counterparts, and we have:

$\tau_{\{AA,CA\},s_0} = \tau_{s_0} = \frac{\tau_{AA,s_0}\pi_{AA,s_0} + \tau_{CA,s_0}\pi_{CA,s_0}}{\pi_{AA,s_0} + \pi_{CA,s_0}}$,

where $\pi_{g,s_0} = \Pr(G_i = g \mid i \in \mathcal{U}_{s_0})$, for g = AA, CA, NA, DA, denote the proportions of
principal strata in the subpopulation $\mathcal{U}_{s_0}$. A special case of $\mathcal{U}_{s_0}$ contains the subpopulation
of units with a realized value of the forcing variable exactly equal to the threshold value,
$s_0$.

It is worth noting that Assumption 4 implies that

$E[Y_i(1) - Y_i(0) \mid G_i = g, i \in \mathcal{U}_{s_0}] = E[Y_i(1) - Y_i(0) \mid Z_i = 1, G_i = g, i \in \mathcal{U}_{s_0}]$.

Under the allocation rule of the Italian university grants, $Z_i = W_i^{obs}$ for always-applicants
and compliant-applicants. Therefore, the local randomization assumption allows the
estimands $\tau_{AA,s_0}$, $\tau_{CA,s_0}$ and $\tau_{s_0}$ to be interpreted as causal effects of receiving a grant for
subpopulations of students who actually receive a grant, analogous to the notion of average
treatment effect for the treated.
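
The step from local randomization to conditioning on eligibility can be spelled out as follows. This is a sketch of the standard argument rather than a derivation quoted from the paper; the independence symbol and the calligraphic notation for the subpopulation are notational choices made here.

```latex
% Within U_{s_0}, local randomization makes S_i independent of all potential
% outcomes. Z_i is a function of S_i, and G_i is a function of (A_i(0), A_i(1)).
\begin{align*}
S_i \perp \bigl(A_i(0), A_i(1), Y_i(0), Y_i(1)\bigr)
  &\;\Longrightarrow\; Z_i \perp \bigl(Y_i(0), Y_i(1), G_i\bigr) \\
  &\;\Longrightarrow\; Z_i \perp \bigl(Y_i(0), Y_i(1)\bigr) \mid G_i , \\[4pt]
E\bigl[Y_i(1) - Y_i(0) \mid G_i = g,\ i \in \mathcal{U}_{s_0}\bigr]
  &= E\bigl[Y_i(1) - Y_i(0) \mid Z_i = 1,\ G_i = g,\ i \in \mathcal{U}_{s_0}\bigr].
\end{align*}
% The conditioning event Z_i = 1 has positive probability by local overlap.
```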

3.2. Two additional assumptions. The following two assumptions, though not
necessary for Bayesian inference, are plausible in our study and can sharpen the inference.

Assumption 5. Monotonicity of Application Status:

$A_i(1) \ge A_i(0)$, for all $i \in \mathcal{U}_{s_0}$.

Assumption 6. Stochastic Exclusion Restriction for Never-Applicants:

$\Pr(Y_i(1) \mid G_i = NA, i \in \mathcal{U}_{s_0}) = \Pr(Y_i(0) \mid G_i = NA, i \in \mathcal{U}_{s_0})$.


Monotonicity rules out the existence of defiant-applicants. The exclusion restriction 
rules out direct effects of eligibility on dropout for never-applicants. Never-applicants are 
students who would never apply for a grant irrespective of their eligibility status. These 
students would not receive the grant in any case. The exclusion restriction for never-applicants
(Assumption 6) is of a very different nature from the exclusion restriction for
compliant-applicants and always-applicants (Assumption 1): Assumption 6 has implications for
inference but not for interpretation, whereas Assumption 1 is made solely for interpreting the
causal effects of assignment on the outcome as attributable to the causal effects of treatment
on the outcome. More discussion of the difference can be found in Imbens and Rubin
(2015, Chapter 23) and Mealli and Pacini (2013). 

3.3. Selection of the subpopulations. An important issue in practice is the selection of
the subpopulation $\mathcal{U}_{s_0}$ where the RD assumptions hold. There can be a diverse choice of
the shape of the subpopulation. In this paper, we limit our choice to symmetric intervals
with respect to $s_0$ for convenience and also to match the common practice of RD analysis.
Specifically, we make the following assumption:

Assumption 7. There exists h > 0 such that for each $\epsilon > 0$,
$\Pr(s_0 - h < S_i < s_0 + h) > 1 - \epsilon$, for each $i \in \mathcal{U}_{s_0}$.

Assumption 7 allows us to focus on the specific subsets of symmetric intervals among
all neighborhoods of different shape around the threshold, $s_0$. Note that Assumptions 2
and 7 do not imply that $\mathcal{U}_{s_0}$ is unique. They only require that there exists at least one
subpopulation, $\mathcal{U}_{s_0}$. Consequently, we are not interested in finding the largest h, but we
only aim at determining plausible values for h.

Our approach for selecting bandwidth h exploits the fact that Assumption 4 is a local
randomization assumption, in the sense that it holds for a subset of units, but may not
hold in general for other units. As such, under Assumption 4, in the subpopulation $\mathcal{U}_{s_0}$,
pre-treatment variables should be well balanced in the two subsamples defined by
assignment, and thus any test of the null hypothesis of no effect of assignment on pre-treatment
covariates should fail to reject the null.
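
The idea of checking covariate balance inside candidate windows can be sketched in Python as follows. This is a descriptive stand-in only: it uses a standardized mean difference instead of the Bayesian hierarchical model adopted in the paper, and the function names, the closed window and the data are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(x_elig: list[float], x_inel: list[float]) -> float:
    """Difference in covariate means divided by the pooled standard deviation."""
    pooled_sd = sqrt((stdev(x_elig) ** 2 + stdev(x_inel) ** 2) / 2)
    return (mean(x_elig) - mean(x_inel)) / pooled_sd


def balance_in_window(s: list[float], x: list[float], s0: float, h: float) -> float:
    """Standardized difference of covariate x between eligible and ineligible
    units whose forcing variable lies within h of the threshold s0."""
    elig = [xi for si, xi in zip(s, x) if s0 - h <= si <= s0]
    inel = [xi for si, xi in zip(s, x) if s0 < si <= s0 + h]
    return standardized_difference(elig, inel)


# Hypothetical forcing variable (euros) and one pre-treatment covariate (age).
s = [13000, 14000, 14500, 14900, 15100, 15500, 16000, 17000]
x = [20, 19, 21, 20, 21, 19, 20, 22]
s0 = 15000

for h in (1000, 2000):
    print(h, round(balance_in_window(s, x, s0, h), 3))
```

In this toy example the covariate is balanced in the narrower window and less so in the wider one, which is the pattern one would look for when judging whether a bandwidth is plausible.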

Assessing balance in the observed covariates raises problems of multiple comparisons,
which may lead to a much higher than planned type I error if they are ignored (e.g.,
Benjamini and Hochberg, 1995). We account for multiplicities using a Bayesian
hierarchical mixed model, which provides an explicit method for borrowing information across
covariates (e.g., Berry and Berry, 2004; Scott and Berger, 2006). Following Berry and Berry
(2004), we use a mixture for the prior distribution of the eligibility parameters by
assigning a point mass on equality of the means of the covariates between eligible and ineligible
units. This Bayesian procedure provides a measure of the risk (posterior probability) that a
chosen interval around the threshold, $s_0$, defines a subpopulation of units that does not
exactly match any true $\mathcal{U}_{s_0}$, including subjects for which our RD assumptions do not hold.
More details are given in Section 5. The idea to exploit balance tests of pre-assignment
variables to select a subpopulation of units is also used in Cattaneo, Frandsen and Titiunik
(2015), but their approach aims at selecting the largest subpopulation and does not account
for multiple comparisons.

Our approach parallels more conventional RD approaches based on local polynomial regression,
which also involve bandwidth selection but with a very different objective, namely
finding an optimal balance between precision and bias at the threshold for local polynomials
(e.g., Ludwig and Miller, 2007; Lee and Lemieux, 2010; Imbens and Kalyanaraman,
2012). In contrast, the objective in our framework is to find a subpopulation where our RD
assumptions are plausible, and the selected subpopulation defines the target population.

3.4. Mode of inference. Once the subpopulation $\mathcal{U}_{s_0}$ is chosen, and under the RD
assumptions 2-4, one can choose different modes of inference for the target causal estimands,
as in the large literature of principal stratification. For example, under the additional
Assumptions 5 and 6, the average causal effect for compliers, $\tau_{s_0}$, is non-parametrically point
identified and could be estimated using standard moment-based (instrumental variable)
methods. But the average causal effects for always-applicants and compliant-applicants,
$\tau_{AA,s_0}$ and $\tau_{CA,s_0}$, can only be non-parametrically partially identified (Mealli and Pacini,
2013). One can also use likelihood approaches to parametrically estimate causal effects
(e.g., Frumento et al., 2012; Mercatanti, 2013). Randomization-based inference (Fisher,
1925), as in Cattaneo, Frandsen and Titiunik (2015), could also be adopted.
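For concreteness, the moment-based (instrumental variable) estimator mentioned above can be sketched as a Wald ratio: the difference in mean outcomes between eligible and ineligible units divided by the difference in mean treatment receipt. The Python sketch below is only an illustration; the function name and the toy data are hypothetical, with `y` standing for dropout, `w` for grant receipt and `z` for eligibility.

```python
def wald_estimate(y, w, z):
    """Moment-based IV (Wald) estimate: intention-to-treat effect on the
    outcome divided by the intention-to-treat effect on treatment receipt."""

    def group_mean(values, group):
        selected = [v for v, zi in zip(values, z) if zi == group]
        if not selected:
            raise ValueError("no units with z == %d" % group)
        return sum(selected) / len(selected)

    itt_y = group_mean(y, 1) - group_mean(y, 0)  # effect of eligibility on outcome
    itt_w = group_mean(w, 1) - group_mean(w, 0)  # effect of eligibility on receipt
    if itt_w == 0:
        raise ValueError("eligibility does not change treatment receipt")
    return itt_y / itt_w


# Hypothetical toy data: 4 eligible and 4 ineligible students.
z_toy = [1, 1, 1, 1, 0, 0, 0, 0]
w_toy = [1, 1, 0, 0, 0, 0, 0, 0]   # only eligible students can receive the grant
y_toy = [0, 0, 1, 0, 1, 0, 1, 0]   # dropout indicator
wald_toy = wald_estimate(y_toy, w_toy, z_toy)
print(wald_toy)
```

This ratio targets the effect for compliers only; it does not separate always-applicants from compliant-applicants, which is the distinction the principal stratification analysis is designed to make.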

In this article, we choose the Bayesian approach for inference for the following reasons.
First, causal inference in RD designs usually involves complex observational data, with
multiple sources of uncertainties, including the missing potential outcomes; the Bayesian
approach is particularly useful for accounting for uncertainties and for pooling information
from the data in such complex settings. Second, RD analysis usually relies on a sample
of units with values of the forcing variable close to a single point, the size of which may
be small; Bayesian methods, not relying on asymptotic approximations, are attractive in
dealing with small samples. Third, in the Bayesian paradigm, the missing potential outcomes
are treated as random variables, and all inferences are based on the posterior distributions
of causal estimands, which are functions of potential outcomes. Thus inference
about finite-sample and super-population estimands can be drawn using the same inferential
procedures. Finally, pre-treatment variables can be easily incorporated in the Bayesian
approach, which may improve efficiency of the analysis, i.e., reduce posterior variability.

4. Bayesian inference. Our development of the Bayesian approach builds on the seminal
works of Rubin (1978) and Imbens and Rubin (1997). Below we give a brief outline
for conducting principal stratification analysis using a Bayesian approach; the readers may
refer to the existing literature for more details (e.g., Elliott, Raghunathan and Li, 2010;
Schwartz, Li and Mealli, 2011; Mattei, Li and Mealli, 2013). Throughout the discussion,
we use $p(\cdot \mid \cdot)$ and $\theta_{\cdot \mid \cdot}$ to denote generic conditional distributions and the corresponding
parameters, respectively.

Nine quantities are associated with each unit: $Y_i(0)$, $Y_i(1)$, $W_i(0)$, $W_i(1)$, $A_i(0)$, $A_i(1)$, $X_i$,
$Z_i$, $S_i$. Among these, $S_i$ completely determines $Z_i$; the principal stratum $G_i = (A_i(0), A_i(1))$
and $S_i$ completely determine $(W_i(0), W_i(1))$. Therefore, inference for causal effects involves
only $Y_i(0)$, $Y_i(1)$, $A_i(0)$, $A_i(1)$, $X_i$, $S_i$, of which four are observed: $S_i$, $X_i$, $A_i^{obs} = A_i(Z_i)$,
$Y_i^{obs} = Y_i(Z_i)$, and two are unobserved: $A_i^{mis} = A_i(1 - Z_i)$, $Y_i^{mis} = Y_i(1 - Z_i)$.

Bayesian inference considers the observed values to be realizations of random variables
and the unobserved values to be unobserved random variables. Let $p(Y(0), Y(1), A(0), A(1), X, S; \mathcal{U}_{s_0})$
denote the joint probability density function of these random variables for all
units in $\mathcal{U}_{s_0}$. We assume this distribution is unit-exchangeable, that is, it is invariant under
a permutation of the unit indices. Then, with essentially no loss of generality, by appealing
to de Finetti's theorem (de Finetti, 1963), we can assume that there exists an unknown parameter
vector $\theta$, which is itself a random variable having a known prior distribution $p(\theta)$
such that:

\[
p(Y(0), Y(1), A(0), A(1), X, S; \mathcal{U}_{s_0}) = \int \prod_{i \in \mathcal{U}_{s_0}} p(Y_i(0), Y_i(1), A_i(0), A_i(1), X_i, S_i \mid \theta)\, p(\theta)\, d\theta.
\]


Bayesian inference of the causal estimands, which are functions of $Y_i(z)$'s and $A_i(z)$'s,
centers around deriving the posterior distribution for the parameter vector of their distribution,
denoted by $\theta_{Y,G}$. Under Assumption 4, and assuming the parameters governing the
distributions of the covariates, the forcing variable, and the potential outcomes are a priori
distinct and independent, the posterior distribution of $\theta_{Y,G}$ can be written as follows:

\[
p(\theta_{Y,G} \mid Y^{obs}, A^{obs}, X, S; \mathcal{U}_{s_0}) \propto p(\theta_{Y|G}) \times p(\theta_G) \times \prod_{i \in \mathcal{U}_{s_0}} \int\!\!\int p(Y_i(0), Y_i(1) \mid G_i, X_i; \theta_{Y|G})\, p(G_i \mid X_i; \theta_G)\, dY_i^{mis}\, dA_i^{mis}.
\tag{6}
\]

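The above decomposition suggests that two models need to be specified for model-based
inference: (1) the model for potential outcomes conditional on principal strata and covariates,
and (2) the model for principal strata conditional on covariates, as well as the prior
distribution for the parameters, $p(\theta_{Y,G})$, with $\theta_{Y,G} = (\theta_{Y|G}, \theta_G)$.

Let $\pi_{i,g} = \Pr(G_i = g \mid X_i; \theta_G)$ and $f_{i,g,z} = p(Y_i(z) \mid G_i = g, X_i; \theta_{Y|G})$. Then the posterior
distribution of $\theta_{Y,G}$ given the observed data can be written as follows:

\[
\begin{aligned}
p(\theta_{Y,G} \mid Y^{obs}, A^{obs}, X, S; \mathcal{U}_{s_0}) \propto{}& p(\theta_{Y,G})
\times \prod_{i \in \mathcal{U}_{s_0}:\, S_i > s_0,\, A_i^{obs} = 0} \left( \pi_{i,CA} f_{i,CA,0} + \pi_{i,NA} f_{i,NA} \right)
\times \prod_{i \in \mathcal{U}_{s_0}:\, S_i > s_0,\, A_i^{obs} = 1} \pi_{i,AA} f_{i,AA,0} \\
& \times \prod_{i \in \mathcal{U}_{s_0}:\, S_i \le s_0,\, A_i^{obs} = 0} \pi_{i,NA} f_{i,NA}
\times \prod_{i \in \mathcal{U}_{s_0}:\, S_i \le s_0,\, A_i^{obs} = 1} \left( \pi_{i,AA} f_{i,AA,1} + \pi_{i,CA} f_{i,CA,1} \right),
\end{aligned}
\tag{7}
\]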

where $f_{i,NA} \equiv f_{i,NA,0} = f_{i,NA,1}$ by the exclusion restriction (Assumption 6). The likelihood
function, specified by the four products, does not depend on the association between
the potential outcomes $Y_i(0)$ and $Y_i(1)$. Therefore the posterior distribution of the association
parameters equals their prior distribution as long as the association parameters are
a priori independent of the other parameters, as we assume henceforth. The population-average
causal estimands $\tau_{AA,s_0}$, $\tau_{CA,s_0}$, $\tau_{s_0}$ are functions of the parameter vector $\theta_{Y,G}$,
which is free of the association parameters; therefore inference for them does not involve
the association parameters (also see discussion in Imbens and Rubin, 1997). Inference for
sample-average causal estimands for the units in the study, on the other hand, does generally
involve the association parameters. In our application inference for sample-average causal
estimands is drawn under the assumption that for each unit $i$, potential outcomes, $Y_i(0)$ and
$Y_i(1)$, are independent conditional on $X_i$ and $\theta$.
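The mixture structure of the likelihood can be made concrete with a short Python sketch of the contribution of a single unit, depending on its eligibility and observed application status. This is an illustration under simplifying assumptions, not the implementation used in this paper: the function name, the dictionary layout and the numerical values are hypothetical, and `f[(g, z)]` stands for the model probability of the unit's observed outcome in stratum `g` under assignment `z`.

```python
def unit_likelihood(z, a_obs, pi, f):
    """Observed-data likelihood contribution of one unit.

    z     : eligibility (1 = eligible, 0 = ineligible)
    a_obs : observed application status (1 = applied, 0 = did not apply)
    pi    : dict mapping stratum ('AA', 'CA', 'NA') to its probability
    f     : dict mapping (stratum, z) to the probability of the observed outcome
    """
    if z == 0 and a_obs == 0:
        # Ineligible non-applicants: compliant-applicants or never-applicants.
        return pi["CA"] * f[("CA", 0)] + pi["NA"] * f[("NA", 0)]
    if z == 0 and a_obs == 1:
        # Ineligible applicants can only be always-applicants.
        return pi["AA"] * f[("AA", 0)]
    if z == 1 and a_obs == 0:
        # Eligible non-applicants can only be never-applicants.
        return pi["NA"] * f[("NA", 1)]
    if z == 1 and a_obs == 1:
        # Eligible applicants: always-applicants or compliant-applicants.
        return pi["AA"] * f[("AA", 1)] + pi["CA"] * f[("CA", 1)]
    raise ValueError("z and a_obs must be 0 or 1")


# Hypothetical values; f[('NA', 0)] == f[('NA', 1)] mimics the exclusion restriction.
pi_toy = {"AA": 0.33, "CA": 0.05, "NA": 0.62}
f_toy = {("AA", 0): 0.50, ("AA", 1): 0.35,
         ("CA", 0): 0.40, ("CA", 1): 0.40,
         ("NA", 0): 0.36, ("NA", 1): 0.36}
lik_mix = unit_likelihood(0, 0, pi_toy, f_toy)
lik_single = unit_likelihood(0, 1, pi_toy, f_toy)
print(lik_mix, lik_single)
```

Multiplying such contributions over all units and by the prior gives, up to a constant, the posterior in Equation (7).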

Posterior inference of $\theta_{Y,G}$ can be obtained using Gibbs sampling with a data augmentation
step to impute the missing $A^{mis}$, iteratively drawing from the two posterior predictive
distributions, $p(\theta_{Y,G} \mid Y^{obs}, A^{obs}, A^{mis}, X, S; \mathcal{U}_{s_0})$ and $p(A^{mis} \mid Y^{obs}, A^{obs}, X, S, \theta_{Y,G}; \mathcal{U}_{s_0})$.
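The data augmentation step can be sketched as follows. Imputing the missing application status is equivalent to imputing the principal stratum of each unit, and for units whose stratum is not determined by the observed data the draw is proportional to the stratum probability times the outcome probability. The Python code below is a minimal sketch under these assumptions; the function name, arguments and numbers are hypothetical and the parameter update step of the Gibbs sampler is not shown.

```python
import random


def draw_stratum(z, a_obs, pi, f, rng):
    """Draw the principal stratum of one unit given its observed data and
    the current parameter values (summarized by pi and f)."""
    if z == 0 and a_obs == 1:
        return "AA"  # stratum fully determined by the observed data
    if z == 1 and a_obs == 0:
        return "NA"  # stratum fully determined by the observed data
    # Remaining units are a two-component mixture.
    candidates = ("CA", "NA") if z == 0 else ("AA", "CA")
    weights = [pi[g] * f[(g, z)] for g in candidates]
    total = sum(weights)
    if total <= 0:
        raise ValueError("mixture weights must have a positive sum")
    return candidates[0] if rng.random() < weights[0] / total else candidates[1]


# Hypothetical current values of the stratum and outcome probabilities.
pi_cur = {"AA": 0.33, "CA": 0.05, "NA": 0.62}
f_cur = {("AA", 0): 0.50, ("AA", 1): 0.35,
         ("CA", 0): 0.40, ("CA", 1): 0.40,
         ("NA", 0): 0.36, ("NA", 1): 0.36}
rng_demo = random.Random(0)
draws = [draw_stratum(1, 1, pi_cur, f_cur, rng_demo) for _ in range(10000)]
share_aa = draws.count("AA") / len(draws)
print(share_aa)
```

In a full sampler this draw would alternate with a draw of the parameters given the completed data.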

Specification of $\pi_{i,g}$, $f_{i,g,z}$ and the corresponding prior-to-posterior computation depends on
the specific application. Details of the models and computation in our application will be
provided in Section 5. As a general guideline, we recommend specifying $\pi_{i,g}$ and $f_{i,g,z}$
conditional on both covariates $X$ and the forcing variable $S$, even though Equation (6) suggests
conditioning on $S$ is not required. Indeed, if the true subpopulations $\mathcal{U}_{s_0}$ were known, in
theory, we would not need to adjust for $S$, because local randomization guarantees that for
units in $\mathcal{U}_{s_0}$ values of the forcing variable falling above or below the threshold are
independent of the potential outcomes. However, in practice, the true subpopulations $\mathcal{U}_{s_0}$ are
usually unknown and once a subpopulation has been selected, that is, once a value for $h$,
say $h^*$, has been chosen, there may be some units with a realized value of $S$ between $s_0 - h^*$
and $s_0 + h^*$ who do not belong to $\mathcal{U}_{s_0}$. For these units there may be a relationship between
the forcing variable and potential outcomes, and these potential dependences need to be
modeled. Specifically, systematic differences in the forcing variable $S$ that, by definition,
occur between eligible and ineligible units, may affect inference in the presence of students
who do not belong to $\mathcal{U}_{s_0}$.

5. Evaluation of Italian university grants. 

5.1. Data. We apply the proposed method to the data from the cohort of first-year students
enrolled in 2004 to 2006 at University of Pisa and University of Florence. For each
student, information on grant application status, grant receipt status at the
beginning of the academic year, dropout status at the end of the academic year, and covariates
($X_i$) is obtained from ministry of education and university administrative records. The
forcing variable $S$ is a combined economic measure of each student, calculated from one's
income tax return and property, adjusted for family size, based on a formula that is typically
not fully known to the students. In all three years, a student is eligible if his or her combined
economic measure is below 15 000 euros. Thus, the eligibility status ($Z$) is
also observed. Typically, students need support from fiscal experts to compute their value
of $S$, and the income revenue authority conducts random inspections to verify that the official
tax returns were reported. These factors make it extremely difficult, if not impossible,
for students or students' families to manipulate the value of $S$ in order to end up on the
right side of the threshold. Therefore we argue that the local randomization assumption is
reasonable here. Ineligible students usually apply either because they are not fully aware
of their eligibility status, or because they hope that their application will still be considered
because of extra funding or other considerations.

Covariates include sex, high school grade, high school type (4 categories), major in
university (6 categories), indicator of year of enrollment (2004, 2005, 2006) and indicator
of university (Pisa vs. Florence). Note that the data only include students who had a high
school grade of at least 70/100 and applied either for a grant or for a reduction of tuition
fee. Summary statistics of important variables for the students with the combined economic
measure $S$ within 1 000 euros of the threshold are given in Table 1. An unadjusted
comparison would suggest that the applicants have higher high-school grades, which is
an important indicator of a student's academic performance, but also a higher dropout rate
regardless of their eligibility status.

Application rate and dropout rate as a function of $S$ among the students are given in
Figure 1. The overall dropout rate is high, consistently between 30% and 50% regardless of
the economic measure. From the fitted lines using local logistic polynomial models with
order 3 on the two sides of the threshold, discontinuity is clearly visible in both application
rate and dropout rate at the threshold. As the economic measure increases, application rate
steadily decreases, while the trend in dropout rate has a concave change at the threshold,
increasing on the left of the threshold and decreasing on the right.

Table 1

Summary statistics of the first-year students enrolled in 2004-2006 at Universities of Pisa and Florence, for
the students with $S_i \in (14\,000, 16\,000)$ euros (i.e., $h = 1\,000$, $s_0 = 15\,000$).

                           Z = 0                     Z = 1
Variable             A^obs = 0   A^obs = 1     A^obs = 0   A^obs = 1
Sample Size             657         304           703         444
Dropout                 0.36        0.50          0.35        0.36
S (euros)            15 495      15 509        14 504      14 499
Female                  0.59        0.61          0.60        0.55
HS Grade               80.80       84.35         80.17       84.47
University (Pisa)       0.37        0.51          0.37        0.51

5.2. Selection of the subpopulation. We apply the Bayesian approach to multiple testing
discussed in Section 3.3 to find subpopulations of units where our RD assumptions
hold. Specifically we use a hierarchical Bayesian model for assessing the balance of the
covariates between eligibility groups. We specify probit models for binary variables, conditional
probit models for categorical variables and Gaussian models for continuous variables.
Formally, we assume that $X_{ij} \sim N(\gamma_{0j} + \gamma_{1j} Z_i, \sigma_j^2)$ if $X_j$ is continuous, and
$\Pr(X_{ij} = 1) = \Pr(X^*_{ij} > 0)$ with $X^*_{ij} \sim N(\gamma_{0j} + \gamma_{1j} Z_i, 1)$, if $X_j$ is binary. If $X_j$ is a categorical
variable taking on $K$ values we assume that $\Pr(X_{ij} = 1) = \Pr(X^{*(1)}_{ij} \le 0)$, and
$\Pr(X_{ij} = k) = \Pr\left(\bigcap_{l=1}^{k-1} \{X^{*(l)}_{ij} > 0\} \cap \{X^{*(k)}_{ij} \le 0\}\right)$ for $k = 2, \ldots, K - 1$, where
$X^{*(k)}_{ij} \sim N(\gamma^{(k)}_{0j} + \gamma^{(k)}_{1j} Z_i, 1)$, $k = 1, \ldots, K - 1$, independently. Let
$\gamma_{0j} = (\gamma^{(1)}_{0j}, \ldots, \gamma^{(K-1)}_{0j})$ and $\gamma_{1j} = (\gamma^{(1)}_{1j}, \ldots, \gamma^{(K-1)}_{1j})$.

We specify the following prior distributions for the model parameters. The variances
of the continuous variables have an inverse-Gamma distribution: $\sigma_j^2 \sim IG(a, b)$. The $\gamma_0$'s
have Gaussian prior distributions: for continuous and binary variables, $\gamma_{0j} \sim N(\mu_{\gamma_0}, \sigma^2_{\gamma_0})$,
and for categorical variables, $\gamma_{0j} \sim N_{K-1}(\mu_{\gamma_0} 1_{K-1}, \sigma^2_{\gamma_0} I_{K-1})$, with $1_{K-1}$ and $I_{K-1}$ being the
$(K - 1)$-dimensional vector of ones and the identity matrix of order $K - 1$, respectively.
Further, for continuous and binary variables, parameters $\gamma_{1j}$ are the difference between
means/proportions for eligible and ineligible units. If $\gamma_{1j} = 0$ then $X_j$ has the same distribution
for eligible and ineligible units. For a categorical variable taking on $K$ values, the
proportion of units in each category is the same for eligible and ineligible units if and only
if $\gamma^{(k)}_{1j} = 0$ for each $k = 1, \ldots, K - 1$. We assign positive probability to these possibilities
using the following mixture prior distributions:

\[
\gamma_{1j} \sim \pi_{\gamma_1} \delta_0(\gamma_{1j}) + (1 - \pi_{\gamma_1}) N(\mu_{\gamma_1}, \sigma^2_{\gamma_1})
\]

and

\[
\gamma_{1j} \sim \prod_{k=1}^{K-1} \left[ \pi_{\gamma_1} \delta_0(\gamma^{(k)}_{1j}) + (1 - \pi_{\gamma_1}) N(\mu_{\gamma_1}, \sigma^2_{\gamma_1}) \right],
\]

where $\delta_0(\cdot)$ is the Dirac delta distribution.

Fig 1. Application rate (a) and dropout rate (b) as a function of the forcing variable for the first-year students in
Universities of Florence and Pisa in 2004-2006. The smoothed lines are estimated using polynomial logistic
regression models (of order 3) on each side of the threshold separately; each point is calculated from the
units within a binwidth of 50 euros. Panel (a): Application rate. Panel (b): Dropout rate (□ Non-Applicants, • Applicants).
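To see how a point mass at zero translates into a posterior probability of balance, consider a deliberately simplified setting that is not the hierarchical model of this paper: a single observed difference in means $d$ with known standard error, a prior probability of exact balance, and a Gaussian slab for the difference otherwise. Under these assumptions the posterior probability of the point mass has a closed form, sketched below in Python; all names and numerical values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a Gaussian distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_prob_null(d, se, prior_null, slab_mean, slab_sd):
    """Posterior probability that the true difference is exactly zero under a
    spike-and-slab prior, treating the standard error se as known."""
    # Marginal density of d when the true difference is zero (spike).
    m0 = normal_pdf(d, 0.0, se ** 2)
    # Marginal density of d when the true difference comes from the slab.
    m1 = normal_pdf(d, slab_mean, se ** 2 + slab_sd ** 2)
    return prior_null * m0 / (prior_null * m0 + (1.0 - prior_null) * m1)


# Hypothetical inputs: no observed imbalance versus a large observed imbalance.
p_balanced = posterior_prob_null(d=0.0, se=1.0, prior_null=0.5, slab_mean=0.0, slab_sd=3.0)
p_unbalanced = posterior_prob_null(d=5.0, se=1.0, prior_null=0.5, slab_mean=0.0, slab_sd=3.0)
print(round(p_balanced, 3), p_unbalanced)
```

The hierarchical model additionally learns the mixture weight and the slab parameters across covariates, which is how it borrows information and adjusts for multiplicity.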

For the hyperparameters, we assign the following prior distributions: ~ N(fi , 

We implement the Bayesian model for assessing the balance of covariates on the two
sides of the threshold for various subpopulations defined by different values of $h$. Details
on the Markov chain Monte Carlo (MCMC) algorithm for the posterior computation are relegated to
the Web Supplementary Material. Table 2 shows the posterior probabilities that the covariates
have the same distribution between eligible and ineligible students for the subpopulations
defined by $h$ = 250, 500, 750, 1 000, 1 500, 2 000, 2 500, 3 000, 4 000, 5 000. These values
show that the probability of the pre-assignment variables being well balanced is high for
subpopulations defined by values of $h$ strictly lower than 1 500: the vast majority of these
probabilities are larger than or close to 0.8. Note that the probabilities are in general lower
among the covariates of "major in university", suggesting these covariates may not be as
balanced as other covariates. Nonetheless, nearly all these probabilities are still higher than 0.6,
with the single lowest probability being 0.565 (Tech major in university). For larger
subpopulations some covariates, such as the "indicator of university," are clearly unbalanced.

The risk that a chosen interval around the threshold defines a subpopulation
that includes units not belonging to the target subpopulation, $\mathcal{U}_{s_0}$, is not zero. In
order to account for the presence of these units, we conduct the subsequent analyses
conditioning on both covariates and the realized values of the forcing variable. We also
evaluate the robustness of our results by conducting analyses using various values of $h$
($h$ = 500, 1 000, 1 500).

5.3. Parametric models. For the units within the selected subpopulation $\mathcal{U}_{s_0}$ we assume
parametric models for the outcome ($f_{i,g,z}$) and principal strata ($\pi_{i,g}$). Alternative models,
such as Student-t models (Chib and Jacobi, 2011) and Bayesian nonparametric models
(Schwartz, Li and Mealli, 2011), can be considered. Note that although we are using parametric
models, identification does not rely on parametric assumptions. The model for the
principal strata of application consists of two conditional probit models:

\[
\begin{aligned}
\pi_{i,AA} &= \Pr(G_i^*(AA) \le 0), \\
\pi_{i,NA} &= \Pr(G_i^*(AA) > 0 \text{ and } G_i^*(NA) \le 0), \\
\pi_{i,CA} &= 1 - \pi_{i,AA} - \pi_{i,NA},
\end{aligned}
\]

where

\[
G_i^*(g) = \alpha_{g,0} + \alpha_{g,1} S_i^* + X_i' \alpha_{g,X} + \epsilon_{g,i}, \qquad g = AA, NA,
\]

with $\epsilon_{AA,i} \sim N(0, 1)$, $\epsilon_{NA,i} \sim N(0, 1)$ independently, and $S_i^* = (S_i - s_0)/1000$.
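As an illustration of how the two conditional probit models map latent linear predictors into the three stratum probabilities, the following Python sketch evaluates $\pi_{i,AA}$, $\pi_{i,NA}$ and $\pi_{i,CA}$ for one unit. It assumes that each latent variable equals a linear predictor plus an independent standard normal error; the function names and the input values are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def strata_probabilities(eta_aa, eta_na):
    """Principal stratum probabilities from two conditional probit models.

    eta_aa, eta_na : linear predictors of the latent variables for the
                     always-applicant and never-applicant steps.
    """
    # Pr(G*(AA) <= 0) with G*(AA) = eta_aa + eps, eps ~ N(0, 1).
    p_aa = std_normal_cdf(-eta_aa)
    # Pr(G*(AA) > 0 and G*(NA) <= 0), using independence of the two errors.
    p_na = (1.0 - p_aa) * std_normal_cdf(-eta_na)
    # The remaining mass belongs to compliant-applicants.
    p_ca = 1.0 - p_aa - p_na
    return {"AA": p_aa, "NA": p_na, "CA": p_ca}


probs_zero = strata_probabilities(0.0, 0.0)
probs_demo = strata_probabilities(0.4, -1.5)
print(probs_zero, probs_demo)
```

The sequential construction guarantees that the three probabilities are non-negative and sum to one for any values of the linear predictors.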

Dropout, the primary outcome in our application, is binary. Therefore, we assume the
following generalized linear outcome model with a probit link (Albert and Chib, 1993):

\[
\Pr(Y_i(z) = 1 \mid G_i = g, S_i, X_i; \theta_{Y|G}) = \Phi\left(\beta_0^{g,z} + \beta_1^{g,z} S_i^* + X_i' \beta_X^{g,z}\right), \qquad g = AA, CA, NA;\ z = 0, 1.
\]

We impose prior equality of the slope coefficients in the outcome regressions: $\beta_X^{g,z} = \beta_X$
for $g = AA, CA, NA$ and $z = 0, 1$.

Define $\alpha_g = [\alpha_{g,0}, \alpha_{g,1}, \alpha_{g,X}']'$, $g = AA, NA$, and $\beta_{g,z} = [\beta_0^{g,z}, \beta_1^{g,z}, \beta_X']'$, $g = AA, CA, NA$;
$z = 0, 1$. By Assumption 6, $\beta_{NA,0} = \beta_{NA,1}$. We assume that parameters are a priori independent
and use multivariate normal prior distributions whose covariance matrices are proportional to
the identity matrix $I$. We specify flat priors by setting the prior mean vectors to null
vectors and the prior variances to 10, for $g = AA, CA, NA$; $z = 0, 1$.


5.4. Posterior computation. Details of the MCMC algorithm for the posterior computation
based on the outline in Section 4 are given in Web Supplementary Material. Upon
obtaining the posterior draws of the parameters, we calculate three estimates for each
causal estimand: population-average effect within $\mathcal{U}_{s_0}$ and at $s_0$, and sample-average effect
within $\mathcal{U}_{s_0}$. The population-average effects within $\mathcal{U}_{s_0}$ are calculated averaging the
model-based dropout proportions over the empirical distribution of the pre-assignment
variables and the forcing variable:

\[
\tau_{g,s_0} = \frac{\sum_{i \in \mathcal{U}_{s_0}} \left[ \Phi\left(\beta_0^{g,1} + \beta_1^{g,1} S_i^* + X_i' \beta_X^{g,1}\right) - \Phi\left(\beta_0^{g,0} + \beta_1^{g,0} S_i^* + X_i' \beta_X^{g,0}\right) \right] \pi_{i,g}}{\sum_{i \in \mathcal{U}_{s_0}} \pi_{i,g}}
\]

for $g = AA, CA, \{AA, CA\}$, where for the union $\{AA, CA\}$ the sums in the numerator and the
denominator also run over both strata. The population-average effects at $s_0$ are calculated in a similar
way setting $S_i^* = 0$ (i.e., $S_i = s_0$) for each $i$. To obtain the sample-average estimates, we
compute the posterior predictive distributions of the potential outcomes for each student $i$
in $\mathcal{U}_{s_0}$, based on which the sample average is calculated.
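For a single posterior draw of the parameters, the population-average effect for one stratum can be sketched in Python as a weighted average of unit-level differences in probit dropout probabilities, with the stratum probabilities as weights. This is a minimal illustration under that reading of the estimand; the function names, the linear predictors and the weights below are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def population_average_effect(eta_treated, eta_control, weights):
    """Weighted average over units of the difference between the model-based
    dropout probabilities under eligibility and under ineligibility.

    eta_treated, eta_control : per-unit probit linear predictors for z = 1 and z = 0
    weights                  : per-unit probabilities of belonging to the stratum
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    weighted_sum = sum(
        w * (std_normal_cdf(e1) - std_normal_cdf(e0))
        for e1, e0, w in zip(eta_treated, eta_control, weights)
    )
    return weighted_sum / total_weight


# Hypothetical values for two units and one posterior draw.
tau_draw = population_average_effect([-0.5, -0.5], [0.0, 0.0], [0.3, 0.4])
print(tau_draw)
```

Repeating the calculation for every posterior draw yields the posterior distribution of the estimand, from which medians and credible intervals can be read off.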


5.5. Results. We conducted Bayesian analysis using $h$ = 500, 1 000, 1 500. Posterior
inference is based on 5 000 draws from the posterior distributions simulated using single
chains, which were run for 125 000 iterations. To assess convergence of iterative simulation
methods, we calculated the Cramer-von-Mises statistic to test the null hypothesis
that the sampled values come from a stationary distribution and visually inspected the
trace-plots of the causal parameters (functions of model parameters). We also ran multiple
MCMC chains with different starting values for each $h$ to evaluate the mixing of the chains using
the Gelman-Rubin statistic (Gelman and Rubin, 1992). The results provided no evidence
against convergence.*
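As a reminder of what the multiple-chain diagnostic measures, the following Python sketch computes the basic potential scale reduction factor, which compares between-chain and within-chain variability. It is a simplified version without the degrees-of-freedom correction of the original proposal and is not the code used for the analyses reported here; the function name and the toy chains are hypothetical.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains : list of equally long lists of posterior draws, one per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # average within-chain variance
    between = n * variance(chain_means)            # between-chain variance
    if within == 0:
        raise ValueError("chains have no within-chain variation")
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return math.sqrt(pooled / within)


# Hypothetical chains: three drawn from the same distribution, two far apart.
rng = random.Random(2015)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
r_mixed = gelman_rubin(mixed)
r_apart = gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11]])
print(r_mixed, r_apart)
```

Values close to one are consistent with the chains exploring the same distribution, while values well above one indicate that the chains have not mixed.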

*We also conducted Bayesian analysis using alternative models with different order polynomials in $S$ as
well as models conditioning only on $S$ (without using the pre-treatment variables) and null models,
conditioning on neither $S$ nor the pre-treatment covariates. Consistent with results found in Mealli and Rampichini
(2012), higher order polynomials do not lead to substantial inferential benefits, and posterior distributions of
the causal effects of interest did not substantially change with the alternative models, so here we only show
the results based on models conditioning on both $S$ and the pre-treatment covariates.

Table 3 shows posterior medians and 95% credible intervals for the principal strata
proportions under monotonicity and for the causal parameters $\tau_{AA,s_0}$, $\tau_{CA,s_0}$, $\tau_{s_0}$, for
bandwidths ranging from 500 to 1 500 euros. The results are robust across different bandwidths.
The estimated proportions of the principal strata are very similar across different $h$: there
are more than 61% never-applicants, more than 32% always-applicants and less than 6.5%
compliant-applicants. The three estimates for the same causal parameter are also similar.
The posterior distributions of the causal effect for always-applicants, $\tau_{AA,s_0}$, and for the union
of always-applicants and compliant-applicants, $\tau_{s_0}$, are centered on negative values, and
the 95% credible intervals do not cover 0, irrespective of the choice of the bandwidth.

For instance, consider the finite-sample causal effects for the subpopulation within
$h$ = 1 000 euros around the threshold (middle block of columns in Table 3). The estimated
$\tau_{s_0}$ suggests a 13.9% (95% CI: (3.4%; 24.7%)) reduction in dropout rate for the
students who receive the grants. The estimated $\tau_{AA,s_0}$ suggests an even stronger positive
effect among the always-applicants: a 16.1% (95% CI: (5%; 27%)) reduction in dropout
rate. In fact, $\tau_{s_0}$, which is a weighted average of the effects for always-applicants and
compliant-applicants, appears to be diluted by the somewhat surprising small effect among
the compliant-applicants. However, the data do not seem to contain much information on
compliant-applicants (the estimated proportion of compliant-applicants is very small, less
than 5%), and the effects were estimated with large uncertainties.

These results suggest that the current Italian university grants are effective in reducing
dropout from universities among students from families with annual economic measure
around 15 000 euros. Our analysis also reveals some additional information for policy
making. Specifically, always-applicants and compliant-applicants are found to be heterogeneous
with respect to the effect of the grants. The causal effect for compliers, $\tau_{s_0}$, usually
estimated in a standard IV analysis that ignores the application information, is attenuated
by the small (and negative) effect estimated for the small proportion of compliant-applicants.
From a cost-effectiveness perspective, it appears more beneficial for education
administrations to relax the eligibility criteria (i.e., raise the threshold $s_0$, since students are
eligible when their economic measure falls below it) to allow more
applicants to get the grant, than to increase the amount of the grant to awardees. The combination
of low percentage of compliant-applicants and high percentage of always-applicants
suggests that most students with the economic measure being around the threshold who
intend to apply for the grants would apply irrespective of their eligibility. From a policy
perspective, this implies that educational administrations should better explain the rule of
eligibility to potential applicants to discourage ineligible students from applying, and thus
reduce the unnecessary effort that these students and the administration spend on
these applications.

5.6. Posterior Predictive Model Checking. Assessing the plausibility of model assumptions
is critical in model-based approaches. Model checking here is not as crucial
as in other model-based approaches thanks to the randomization assumption, but it is
still prudent to check the model fit since there are uncertainties in the selection of $\mathcal{U}_{s_0}$.
We adopt Bayesian posterior predictive checks (Gelman, Meng and Stern, 1996) to assess
goodness-of-fit of our models in the application. Posterior predictive checks evaluate
goodness-of-fit of models by measuring the discrepancy between the observed data and
replicated data simulated from its posterior predictive distribution. The particular procedure
adopted here is similar to that in Mattei, Li and Mealli (2013, Section 6). Specifically,
we consider three discrepancy measures aimed at assessing whether the model can preserve
broad features of signal, noise and signal-to-noise ratio (SNR) in the drop-out status distribution
for compliant-applicants, always-applicants and the union of these two principal
strata, and calculate posterior predictive p-values (PPPVs) to summarize discrepancies
between the observed data and replicated data. Extreme (close to 0 or 1) PPPVs can be
interpreted as evidence of lack-of-fit of the model in at least some aspects of the observed
data. Further details of the procedure are relegated to Web Supplementary Material.
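The logic of a posterior predictive p-value can be sketched in a few lines of Python: for each posterior draw, a discrepancy measure is evaluated on the observed data and on data replicated under that draw, and the p-value is the share of draws in which the replicated discrepancy is at least as large as the observed one. The function name and the toy numbers are hypothetical, and the specific signal, noise and SNR measures of the application are not reproduced here.

```python
def posterior_predictive_p_value(observed, replicated):
    """Share of posterior draws in which the replicated discrepancy is at
    least as large as the discrepancy computed on the observed data.

    observed, replicated : equally long sequences, one entry per posterior draw.
    """
    pairs = list(zip(observed, replicated))
    if not pairs or len(observed) != len(replicated):
        raise ValueError("need one observed and one replicated value per draw")
    exceed = sum(1 for obs, rep in pairs if rep >= obs)
    return exceed / len(pairs)


# Hypothetical discrepancies for four posterior draws.
pppv_demo = posterior_predictive_p_value([1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 1.0, 3.0])
print(pppv_demo)
```

A value near 0.5 indicates that the observed discrepancy is typical of what the model generates, whereas values near 0 or 1 flag aspects of the data that the model does not reproduce.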

Table 4 shows the PPPVs for the model-fit to the subpopulation with bandwidth of 500,
1 000 and 1 500 euros, respectively. The PPPVs suggest good model-fit for all bandwidths,
except for a slight under-fit for always-applicants in the subpopulation with $h$ = 500, which
is possibly due to the small sample size. We have also calculated the less conservative sampled
posterior predictive p-values (Johnson, 2007; Gosselin, 2011) and obtained similar
conclusions.

6. Discussion. Motivated by the evaluation of Italian university grants, we propose
a probabilistic formulation of the assignment mechanism for regression discontinuity designs
and develop a fully Bayesian approach to draw causal inference within the framework
of principal stratification. In particular, we illustrate how to utilize information on application
status to gain additional insights in program evaluation. Applying the method to
the data from two Italian universities, we find university grants reduce dropping out of
higher education for students from low-income families and the effect size is especially
pronounced for motivated students (always-applicants).

The core of the approach we propose is the local randomization assumption (Assumption 4),
which is intrinsically non-testable. Therefore, it may be worthwhile to conduct
sensitivity analyses aimed at assessing the robustness of the results with respect to violations
of the local randomization assumption. To this end, we conduct further analyses
deriving the posterior distributions of the causal estimands of interest under three additional
model specifications: (1) a model where we specify the model for principal strata,
$\pi_{i,g}$, and the conditional model for potential outcomes given principal strata, $f_{i,g,z}$, conditioning
on neither the forcing variable nor the pre-treatment variables; (2) a model where
we specify $\pi_{i,g}$ and $f_{i,g,z}$ conditioning only on the forcing variable, without including the
pre-treatment variables; and (3) a model where we specify $\pi_{i,g}$ and $f_{i,g,z}$ conditioning only
on the pre-treatment variables, without including the forcing variable. Under local randomization,
adjusting inference for either the forcing variable, $S$, or the pre-treatment
variables, $X$, should not be required; therefore we expect that results are similar across
different model specifications. Indeed the results, shown in Web Supplementary Material, are
robust across different model specifications, suggesting that causal inference under the
local randomization assumption is credible and fully defensible.

A fundamental distinction between our approach and the previous local-regression based 
RD approaches lies in the role of the forcing variable in the analysis. Specifically, previous 
approaches generally view the forcing variable as a pre-assignment covariate rather than a 
random variable as in our approach. As a consequence, the standard overlap assumption, 
which requires that there are both treated and control units for all values of the covariates 
including the forcing variable, is violated. Violation of the overlap assumption implies that 
the conditional independence assumption, which trivially holds in RD settings, cannot be 
exploited directly. Instead some kind of extrapolation is required, and in order to avoid 
that estimates heavily rely on extrapolation, previous analyses focus on causal effects of 
the treatment for units at the threshold. Smoothness assumptions, for example, continuity 
of conditional regression functions of potential outcomes given the forcing variable, are 
usually assumed to draw inference on those causal effects. Local randomization and continuity
are different assumptions, leading to different causal estimands: under continuity
assumptions units with a realized value of the forcing variable around the threshold are 
used to draw inference on causal effects for units at the threshold, whereas under local 
randomization, inference is drawn on causal effects for units around the threshold. 

In the evaluation of Italian university grants, other than dropout, student's academic performance
(measured by total credits taken or passing rate of exams) is also of great interest
in policy. As illustrated by Mattei, Li and Mealli (2013) and Mercatanti, Li and Mealli
(2014), jointly modeling two outcomes, dropout and academic performance in this case,
would be worthwhile for both practical and inferential purposes, and it is at the top of our
research agenda.

After the first year, the Italian university grant assignment rule combines sequential and
RD designs (Cellini, Ferreira and Rothstein, 2010): grants are allocated both on the basis
of students' family economic indicator and on the grounds of their academic performance
(exam scores above a certain threshold). Such complex assignment mechanisms pose challenges
to causal inference, requiring new structures and assumptions; meanwhile, they also
present great opportunities for extending the existing framework to more general RD settings.
One specific direction of our future research is to develop methods that combine
Bayesian tools for RDs and dynamic treatment regimes (Murphy, 2003; Zajonc, 2012) in
the presence of multiple forcing variables (Imbens and Zajonc, 2011).

SUPPLEMENTARY MATERIAL

Web Supplementary Material: Details of Calculation and Sensitivity Analysis

(http://lib.stat.cmu.edu/aoas/???/???). We describe in detail the Bayesian approach we used
to select the subpopulations, the Markov Chain Monte Carlo (MCMC) methods used to
simulate the posterior distributions of the parameters of the models, the posterior predictive
checks, and the sensitivity analysis regarding local randomization described in Section 6.

Table 2. Posterior probabilities that the covariates have the same distribution between eligible and ineligible students for various subpopulations


Variable 

h=250 

(«=528) 

/r=500 

(n=1042) 

h=150 

(n=1577) 

/!=1000 
(n=2 108) 

h=1500 
(n=3 166) 

h=2 000 
(n=4 197) 

h=2 500 
(n=5 159) 

h=3 000 
(n=6113) 

/j=4000 
(«=8 061) 

h=5 000 
(«=9 846) 

Sex 

.955 

.950 

.960 

.962 

.977 

.970 

.991 

.960 

.968 

.797 

High School Type (Baseline: Other) 









Humanity 

.951 

.952 

.949 

.955 

.979 

.970 

.965 

.986 

.953 

.962 

Science 

.894 

.905 

.926 

.927 

.951 

.889 

.916 

.926 

.045 

.000 

Tech 

.790 

.807 

.790 

.808 

.819 

.619 

.751 

.793 

.003 

.000 

HS Grade 

.955 

.958 

.972 

.978 

.971 

.981 

.987 

.990 

.984 

.986 

Year (Baseline: 2004) 










2005 

.932 

.964 

.954 

.926 

.973 

.977 

.976 

.983 

.861 

.918 

2006 

.883 

.918 

.914 

.909 

.959 

.934 

.952 

.970 

.807 

.884 

University (Pisa) 

.950 

.916 

.971 

.983 

.686 

.097 

.225 

.300 

.082 

.000 

Major in University (Baseline: Other) 









Humanity 

.946 

.899 

.689 

.797 

.798 

.932 

.958 

.990 

.964 

.946 

Science 

.894 

.857 

.660 

.751 

.783 

.901 

.929 

.966 

.911 

.913 

Social Science 

.798 

.821 

.624 

.713 

.758 

.864 

.913 

.953 

.878 

.858 

Bio-Med 

.728 

.776 

.604 

.677 

.136 

.837 

.889 

.926 

.839 

.832 

Tech 

.632 

.634 

.565 

.624 

.699 

.794 

.863 

.876 

.719 

.453 
Table 3

Posterior median and 95% credible intervals of principal strata proportions and super-population and
finite-sample causal effects on dropout for always-applicants (τ_{AA,s_0}), compliant-applicants (τ_{CA,s_0}), and their
union (τ_{s_0}) for the subpopulation within different bandwidths h around the threshold.

                  Population-average        Sample-average            Population-average at s_0
                  Median  95% CI            Median  95% CI            Median  95% CI

h = 500
  Pr(G_i = AA)    .323    (.294; .355)      .322    (.309; .336)      .320    (.291; .352)
  Pr(G_i = CA)    .060    (.031; .105)      .041    (.021; .090)      .058    (.030; .094)
  Pr(G_i = NA)    .616    (.570; .650)      .637    (.590; .651)      .621    (.583; .654)
  τ_{AA,s_0}      -.153   (-.313; -.030)    -.152   (-.307; -.038)    -.154   (-.298; -.030)
  τ_{CA,s_0}      .045    (-.170; .497)     .074    (-.256; .545)     .039    (-.169; .474)
  τ_{s_0}         -.116   (-.253; -.005)    -.120   (-.265; -.009)    -.120   (-.245; -.012)

h = 1000
  Pr(G_i = AA)    .336    (.312; .365)      .333    (.318; .354)      .335    (.311; .363)
  Pr(G_i = CA)    .043    (.002; .086)      .027    (.002; .075)      .043    (.001; .075)
  Pr(G_i = NA)    .623    (.584; .652)      .640    (.599; .645)      .625    (.594; .656)
  τ_{AA,s_0}      -.161   (-.273; -.052)    -.161   (-.270; -.057)    -.154   (-.259; -.052)
  τ_{CA,s_0}      .028    (-.745; .828)     .031    (-.778; .871)     .010    (-.918; .933)
  τ_{s_0}         -.132   (-.242; -.021)    -.139   (-.247; -.034)    -.128   (-.229; -.020)

h = 1500
  Pr(G_i = AA)    .332    (.315; .349)      .332    (.326; .337)      .329    (.312; .346)
  Pr(G_i = CA)    .042    (.035; .077)      .027    (.020; .066)      .042    (.036; .062)
  Pr(G_i = NA)    .625    (.591; .642)      .642    (.605; .644)      .628    (.606; .646)
  τ_{AA,s_0}      -.183   (-.286; -.077)    -.187   (-.291; -.085)    -.153   (-.247; -.063)
  τ_{CA,s_0}      .010    (-.304; .797)     .011    (-.207; .928)     .000    (-.154; .951)
  τ_{s_0}         -.153   (-.256; -.040)    -.165   (-.266; -.057)    -.130   (-.217; -.019)
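
The entries of Table 3 are summaries of posterior draws. The following is a minimal sketch of how a posterior median and a 95% credible interval can be obtained from a vector of MCMC draws. It assumes an equal-tailed interval, which the text does not confirm, and the draws, the function names `quantile` and `summarize`, and the normal distribution used to generate the draws are all hypothetical stand-ins for illustration.

```python
import random
import statistics


def quantile(sorted_draws, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1 - frac) + sorted_draws[hi] * frac


def summarize(draws, level=0.95):
    """Posterior median and equal-tailed credible interval (assumption)."""
    s = sorted(draws)
    alpha = (1 - level) / 2
    return statistics.median(s), (quantile(s, alpha), quantile(s, 1 - alpha))


# Hypothetical draws standing in for the MCMC output of one causal effect;
# they are not the draws behind Table 3.
rng = random.Random(0)
draws = [rng.gauss(-0.15, 0.07) for _ in range(5000)]

median, (lower, upper) = summarize(draws)
print(f"median={median:.3f}, 95% CI=({lower:.3f}; {upper:.3f})")
```

Applying `summarize` separately to the draws of each estimand and each bandwidth h would give one row of a table laid out like Table 3.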


Table 4

Bayesian p-values of signal, noise and SNR under different h for the model used in the application to Italian
university grants.

h       Principal strata    Signal    Noise     SNR

500     {CA}                .095      .630      .094
        {AA}                .254      .325      .254
        {AA, CA}            .338      .273      .370

1000    {CA}                .411      .425      .419
        {AA}                .400      .444      .473
        {AA, CA}            .493      .335      .518

1500    {CA}                .208      .444      .210
        {AA}                .372      .400      .261
        {AA, CA}            .455      .337      .470
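
For readers unfamiliar with the quantities in Table 4, a Bayesian (posterior predictive) p-value is commonly computed as the proportion of posterior draws in which a discrepancy measure evaluated on replicated data is at least as large as the same measure evaluated on the observed data. The sketch below illustrates only this generic recipe. The function name `bayesian_p_value` and the simulated discrepancies are hypothetical, and the definitions of signal, noise and SNR used in the application are not reproduced here.

```python
import random


def bayesian_p_value(replicated, observed):
    """Share of posterior draws whose replicated discrepancy is at least
    as large as the discrepancy computed on the observed data."""
    if len(replicated) != len(observed):
        raise ValueError("need one (replicated, observed) pair per draw")
    exceed = sum(rep >= obs for rep, obs in zip(replicated, observed))
    return exceed / len(replicated)


# Hypothetical discrepancies, one pair per posterior draw. Both are drawn
# from the same distribution, mimicking a model that fits the data well.
rng = random.Random(1)
observed = [rng.gauss(0.0, 1.0) for _ in range(4000)]
replicated = [rng.gauss(0.0, 1.0) for _ in range(4000)]

p = bayesian_p_value(replicated, observed)
print(round(p, 3))
```

Under this convention, values close to 0 or 1 would suggest that the model fails to reproduce the corresponding feature of the data, whereas values away from the extremes give no such indication.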















